Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016
Abstract
Sample-based and statistical singing synthesizers typically require a large amount of data to automatically generate expressive synthetic performances. In this paper we present a singing synthesizer that, using two rather small databases, is able to generate expressive synthesis from an input consisting of notes and lyrics. The system is based on unit selection and uses the Wide-Band Harmonic Sinusoidal Model for transforming samples. The first database focuses on expression and consists of less than 2 minutes of free expressive singing using solely vowels. The second one is the timbre database, which for the English case consists of roughly 35 minutes of monotonic singing of a set of sentences, one syllable per beat. The synthesis is divided into two steps. First, an expressive vowel singing performance of the target song is generated using the expression database. Next, this performance is used as the input control for the synthesis using the timbre database and the target lyrics. A selection of synthetic performances has been submitted to the Interspeech Singing Synthesis Challenge 2016, in which they are compared to other competing systems.
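The abstract does not give the cost functions or search procedure used by the system, so the following is only a generic sketch of how unit selection is commonly framed: one candidate unit is chosen per target position so that the sum of target costs (how well a unit matches the request) and concatenation costs (how well neighbouring units join) is minimal, found by dynamic programming. The function name `select_units`, the representation of units as plain numbers, and both cost callbacks are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    targets: Sequence[float],
    candidates: Sequence[Sequence[float]],
    target_cost: Callable[[float, float], float],
    concat_cost: Callable[[float, float], float],
) -> Tuple[List[int], float]:
    """Choose one candidate index per position, minimizing total cost.

    Hypothetical helper: units are plain floats here (e.g. a pitch value);
    a real system would use richer unit descriptions and costs.
    """
    if len(targets) != len(candidates) or any(len(c) == 0 for c in candidates):
        raise ValueError("need one non-empty candidate list per target")
    if not targets:
        return [], 0.0

    # best[j]: cheapest cost of any path ending at candidate j of the current position
    best = [target_cost(targets[0], u) for u in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor of candidate j at position i

    for i in range(1, len(targets)):
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[i]:
            # Cost of reaching u from each candidate at the previous position
            costs = [
                best[k] + concat_cost(prev, u)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(targets[i], u))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path backwards from the best final candidate
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total
```

With absolute pitch difference as the target cost and half the pitch jump as the concatenation cost, calling `select_units([60, 62, 64], [[59, 61], [62, 70], [63, 64]], ...)` picks the candidates 61, 62 and 64, that is, indices `[1, 0, 1]`.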
Similar resources
Optimal Unit Stitching in a Unit Selection Singing Synthesis System
Unit Selection based speech synthesis systems are currently the best performing, producing natural sounding speech with minimal CPU load. One of the important reasons behind their success is the amount of recordings that are now commonly used in synthesis applications. However, in the case of singing applications, it is quite hard for a database to cover a large phonetic space due to the relati...
Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 Model
Expressive singing voice synthesis requires appropriate control of both prosodic and timbral aspects. While it is desirable to have intuitive control over the expressive parameters, synthesis systems should be able to produce convincing results directly from a score. As countless interpretations of the same score are possible, the system should also target a particular singing style, which ...
Expressive text-to-speech approaches
The core concern of this paper is the modelling and the tractability of expressiveness in natural voice synthesis. In the first part we quickly discuss the imponderable gap between natural and singing voice synthesis approaches. In the second part we outline a four level model and a corpus-based methodology in modelling expressive forms—an essential step towards expressive voice synthesis. We t...
A style control technique for singing voice synthesis based on multiple-regression HSMM
This paper proposes a technique for controlling singing style in the HMM-based singing voice synthesis. A style control technique based on multiple regression HSMM (MRHSMM), which was originally proposed for the HMM-based expressive speech synthesis, is applied to the conventional technique. The idea of pitch adaptive training is introduced into the MRHSMM to improve the modeling accuracy of fu...
Evaluation of Singing Synthesis: Methodology and Case Study with Concatenative and Performative Systems
The special session Singing Synthesis Challenge: Fill-In the Gap aims at comparative evaluation of singing synthesis systems. The task is to synthesize a new couplet for two popular songs. This paper addresses the methodology needed for quality assessment of singing synthesis systems and reports on a case study using 2 systems with a total of 6 different configurations. The two synthesis systems ...